ABSTRACT

We estimated photometric redshifts (z_phot) for more than 1.1 million
galaxies of the ESO Public Kilo-Degree Survey (KiDS) Data Release 2.
KiDS is an optical wide-field imaging survey carried out with the VLT
Survey Telescope (VST) and the OmegaCAM camera, which aims at tackling
open questions in cosmology and galaxy evolution, such as the origin of
dark energy and the channel of galaxy mass growth. We present a
catalogue of photometric redshifts obtained using the Multi Layer
Perceptron with Quasi Newton Algorithm (MLPQNA) model, provided within
the framework of the DAta Mining and Exploration Web Application
REsource (DAMEWARE). These photometric redshifts are based on a
spectroscopic knowledge base which was obtained by merging
spectroscopic datasets from GAMA (Galaxy And Mass Assembly) data
release 2 and SDSS-III data release 9. The overall 1σ uncertainty on
Δz = (z_spec - z_phot)/(1 + z_spec) is ~ 0.03, with a very small
average bias of ~ 0.001, a NMAD of ~ 0.02 and a fraction of
catastrophic outliers (|Δz| > 0.15) of ~ 0.4%.


1 INTRODUCTION 

Photometric redshifts (z_phot) derived from multi-band digital
surveys are crucial to a variety of cosmological applications
(Scranton et al. 2005; Myers et al. 2009; Hennawi et al. 2006;
Giannantonio et al. 2008). In recent years a plethora of methods
has been developed to estimate z_phot (cf. Hildebrandt et al. 2010),
but the advent of a new generation of photometric surveys (to
quote just a few, Pan-STARRS: Kaiser 2004; Euclid: Laureijs et al.
2011; KiDS: de Jong et al. 2013) demands higher accuracy
(Brescia et al. 2014).

The evaluation of photometric redshifts requires the mapping
of the photometric space into the spectroscopic redshift space. All
methods, one way or the other, require the use of a Knowledge
Base (KB) consisting of a set of templates, and differ mainly in
the following aspects: (i) the way in which the KB is constructed
(spectroscopic redshifts or, rather, empirically or theoretically
derived spectral energy distributions or SEDs), and (ii) the adopted
interpolation/fitting algorithm. Methods based on the interpolation
of a spectroscopic KB are usually labeled as empirical.

Many different implementations of empirical methods exist
and we shall recall just a few: i) polynomial fitting (Connolly et al.
1995); ii) nearest neighbours (Csabai et al. 2003); iii) neural
networks (D'Abrusco et al. 2007; Yèche et al. 2010 and references
therein); iv) support vector machines (Wadadekar 2005);
v) regression trees (Carliles et al. 2010); vi) Gaussian processes
(Way & Srivastava 2006; Bonfield et al. 2010); and vii) diffusion
maps (Freeman et al. 2009).

In this paper we discuss the derivation of photometric redshifts
for the galaxies in the Kilo-Degree Survey (KiDS) data release
2 (de Jong et al. 2015). KiDS is an ESO public survey, based on
the VLT Survey Telescope (Capaccioli & Schipani 2011) with the
OmegaCAM camera (Kuijken 2011), that will image 1500 square
degrees in four filters (u, g, r, i), in single epochs per filter. The high
spatial resolution of VST (0.2"/pixel), the photometric depth and
area covered make it a cutting-edge tool for weak gravitational
lensing and galaxy evolution studies. The measurement of unbiased and
high-quality z_phot is a crucial step to pursue many of the scientific
goals which motivated the KiDS survey (de Jong et al. 2013).

We present the z_phot for a sample of ~ 1.1 million galaxies.
These redshifts were derived with the Multi Layer Perceptron with
Quasi Newton Algorithm (MLPQNA) method, described in detail
elsewhere (Brescia et al. 2012, 2013), hence we refer to those
articles for all the mathematical and technical details. Recently,
this method has also been used to derive a catalogue of z_phot for
the entire SDSS-DR9 (Brescia et al. 2014b). In the PHAT1 contest
(Hildebrandt et al. 2010), which blindly compared most existing
methods to estimate z_phot on a very limited KB (~ 500 objects
only), the MLPQNA method proved to be one of the best empirical
methods to date (Cavuoti et al. 2012). It is however worth
noticing that in the PHAT1 contest, MLPQNA did not perform as
well as many SED fitting methods, due to the very limited base
of knowledge available. This situation reverses when significantly
larger KBs properly sampling the photometric parameter space
become available (Brescia et al. 2013, 2014b).

The MLPQNA model is publicly available in the DAta Mining
& Exploration Web Application REsource infrastructure (DAMEWARE;
Brescia et al. 2014a) and has also been implemented in the
PhotoRaptor service package (Cavuoti et al. 2015).

The paper is organized as follows. In Sect. 2 we present the
dataset, while in Sect. 3 the experiments and related outcome are
described and discussed. In Sect. 4 we give a description of the
resulting photometric redshift catalogue. Finally, in Sect. 5 we draw
our conclusions and future prospects.


2 THE DATA 

The sample of galaxies for which we provide z_phot was extracted
from the second data release of the ESO Public Kilo-Degree Survey
(KiDS-ESO-DR2). A detailed description of all the steps followed
to extract the catalogues is given in de Jong et al. (2015). KiDS is
a wide-area optical imaging survey in the four filters (u, g, r, i),
carried out with the VLT Survey Telescope (VST) and the OmegaCAM
camera. The KiDS observation strategy consists of a standard
diagonal dithering pattern (5 dithers in g, r, i and 4 in u-band) in
order to minimize the effect of the inter-CCD gaps in the OmegaCAM
science array. Therefore the final footprint of each single tile
is slightly larger than the nominal 1 square degree (de Jong et al.
2015).

The data processing procedure used is based on the Astro-WISE
(AW) optical pipeline (McFarland et al. 2013). After the first
basic data reduction steps (such as cross-talk, de-biasing and
overscan correction, flat-fielding, illumination correction, de-fringing,
pixel masking, satellite-track removal and background subtraction),
the pipeline performs photometric and astrometric calibrations.

Source extraction is based on a task provided in the AW
environment, KiDS-CAT (de Jong et al. 2015), based on algorithms
developed for the software 2DPHOT (La Barbera et al. 2008).
KiDS-CAT automatically performs a seeing assessment of the image,
using best-quality stars in the image, and subsequently optimizes the
configuration files of SExtractor (Bertin & Arnouts 1996) to
perform the source extraction in the individual bands. In this
process, besides the photometric flag provided by SExtractor,
detected sources are also flagged according to their proximity to
star spikes and haloes (IMAFLAGS_ISO flag), which are identified
in the KiDS images through a dedicated masking procedure
(de Jong et al. 2015; see also Huang et al. in preparation).

In order to derive our photometric redshifts, we used the
multi-band source catalogues, which rely on the double-image mode of
SExtractor. These catalogues are based on source detection in the
r-band images. While magnitudes are measured in all filters, the
Star-Galaxy separation, as well as the source positional and shape
parameters, are based on the r-band data only. The choice of the
r-band as a reference is motivated by the fact that it is observed
under the best seeing conditions (~ 0.7" seeing FWHM, on
average), and therefore it typically has the best image quality, thus
providing the most reliable source positions and shapes. Aperture
photometry in the four bands within several aperture radii, together
with MAG_AUTO, shape parameters and flags, are available from
SExtractor and KiDS-CAT. In the final catalogue, in order to
maximize the sample with z_phot estimates available, we have retained
~ lO’ sources with r-band SExtractor flag FLAGS_r < 4 and 
rejected ~ 2 x 10^ objects having close and bright companion 
sources, affected by bad pixels or originally blended with other
objects (see Bertin & Arnouts 1996 for a detailed description of
extraction flags). The limiting magnitudes of the KiDS catalogue^ at the
1σ level are:


• MAGAP_4_u = 25.17 

• MAGAP_6_u = 24.74 

• MAGAP_4_g = 26.03 

• MAGAP_6_g = 25.61 

• MAGAP_4_r = 25.89 

• MAGAP_6_r = 25.44 

• MAGAP_4_i = 24.53 

• MAGAP_6_i = 24.06 


KiDS DR2 contains 148 tiles observed in all filters during the
first two years of operations. From the original catalogue of ~ 18
million sources, the Star/Galaxy separation leaves ~ 10 million
galaxies, of which ~ 6 million have null IMAFLAGS_ISO in all
the filters, i.e. they are observed in unmasked regions. Out of these,
we succeeded in estimating z_phot for 1,142,992 sources (see Sect. 4
for details).

In order to build the needed spectroscopic knowledge base, the
KiDS galaxy sample was matched with two independent spectroscopic
surveys: GAMA (Galaxy And Mass Assembly) and SDSS
(Sloan Digital Sky Survey). The final spectroscopic sample was
obtained by merging data from GAMA data release 2 (112k new
redshifts in the first three years; Driver et al. 2011, Liske et al. in prep.)
and SDSS-III data release 9 (Ahn et al. 2012, 2014; Bolton et al.
2012; Chen et al. 2012). The redshift distribution of the mixed
catalogue is shown in Fig. 1.

GAMA observes galaxies out to z = 0.5 and r < 19.8 (r-band
petrosian magnitude), reaching a spectroscopic completeness of
98% for the main survey targets. It also provides information about
the quality of the redshift determination by using the
probabilistically defined normalised redshift quality scale nQ. The redshifts
with nQ > 2 are the most reliable (Driver et al. 2011; Hopkins et al.
2013).


For the SDSS-III we used the low z (LOWZ) and constant
mass (CMASS) samples of the Baryon Oscillation Spectroscopic Survey
(BOSS). The BOSS project aims to obtain spectra (redshifts) for
1.5 million luminous galaxies up to z ~ 0.7. The LOWZ sample
consists of galaxies with 0.15 < z < 0.4 with colors similar to
those of luminous red galaxies (LRGs) at z ≲ 0.4. Objects were
selected by applying suitable cuts on magnitudes and colors with
the aim of extending the SDSS LRG sample towards fainter
magnitudes/higher redshifts (see e.g. Ahn et al. 2012; Bolton et al. 2012).


^ We use the MAGAP_4 and MAGAP_6 magnitudes, measured within
circular apertures of 4" and 6" of diameter, respectively. These magnitudes are
provided within the produced z_phot catalogue.


















































The CMASS sample contains three times more galaxies than
the LOWZ sample, and it was designed to select galaxies in the
range 0.4 < z < 0.8. The rest-frame color distribution of the
CMASS sample is significantly broader than that of the LRG one,
thus CMASS contains a nearly complete sample of massive
galaxies down to log M_*/M_⊙ ~ 11.2. The faintest galaxies are at r = 19.5
in the LOWZ sample and i = 19.9 in the CMASS one.

Our spectroscopic sample is therefore dominated by galaxies 
from GAMA (46,603 vs. 1,618 from SDSS) at low-z (z < 0.4), 
while SDSS galaxies dominate the higher redshift regime (out to 
z ~ 0.7), with r < 22. 


3 EXPERIMENTS AND DISCUSSION 

When dealing with supervised machine learning methods, it is common
practice to use the available KB to build a minimum of
three disjoint data subsets: (i) a training set, used to train the
method to find the correlation hidden in the photometric information
among the input features necessary to perform the regression;
(ii) a validation set, used to check the training performance
against a loss of generalization capabilities (a phenomenon also
known as overfitting); (iii) finally, a test set, needed to blindly
evaluate the overall performance of the model with data samples
never submitted to the model before.

In this work, the validation process was embedded into the
training phase, by applying the standard leave-one-out k-fold cross
validation mechanism (Geisser 1975). We would like to stress that
none of the objects included in the training (and validation) sample
were included in the test sample and only the test data were used to
generate the statistics and scatter plots.
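
The fold construction behind a k-fold cross validation can be sketched as follows. This is only a minimal illustration in plain Python, not the DAMEWARE implementation: the helper name k_fold_indices and the use of contiguous folds are assumptions made here, and the value of k adopted in this work is not stated in the text.

```python
def k_fold_indices(n_samples, k):
    """Yield (training, validation) index lists for each of the k folds.

    Hypothetical helper for illustration only; it is not part of
    MLPQNA or DAMEWARE.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional object.
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


# Each object is used for validation exactly once across the k folds.
for training, validation in k_fold_indices(n_samples=10, k=5):
    print(len(training), validation)
```

Setting k equal to the number of objects gives the leave-one-out limit, in which every validation fold contains a single object.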

We created training and test samples with relative sizes of 60%
(36,222 objects) and 40% (24,150 objects) by randomly drawing
without replacement from the KB. The histogram in Fig. 1 shows
the distribution of the KB as a function of z_spec in both the
training and test sets, while Fig. 2 shows the distribution of z_spec
and z_phot in the blind test set. As can be seen, the three distributions
are in almost perfect agreement.
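
A random 60/40 split without replacement can be sketched as below. The function name split_knowledge_base, the fixed seed and the rounding rule are assumptions introduced for illustration; the KB size of 60,372 is simply the sum of the two sample sizes quoted above, and the exact counts obtained by the authors may differ by an object depending on how the fraction was rounded.

```python
import random


def split_knowledge_base(n_objects, train_fraction=0.6, seed=42):
    """Return disjoint (train, test) index lists drawn without replacement.

    Illustrative sketch only; the seed is an assumption, used here to
    make the split repeatable.
    """
    rng = random.Random(seed)
    # Sampling all n_objects indices without replacement is a random permutation.
    shuffled = rng.sample(range(n_objects), n_objects)
    n_train = int(train_fraction * n_objects)
    return shuffled[:n_train], shuffled[n_train:]


train_idx, test_idx = split_knowledge_base(n_objects=60372)
print(len(train_idx), len(test_idx))
```

Because both subsets are slices of a single permutation, no object can appear in both the training and the test sample.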

The results were evaluated using a standard set of statistical
indicators applied to the quantity Δz = (z_spec - z_phot)/(1 + z_spec):

• the bias, defined as the mean value of the residuals Δz;

• the standard deviation (σ) of the residuals;

• the normalized median absolute deviation or NMAD of the
residuals, defined as NMAD(Δz) = 1.48 x Median(|Δz|).
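
The three indicators listed above can be computed as in the following sketch, which uses only the Python standard library. The function names are chosen here for illustration, the input redshifts are made-up numbers rather than survey data, and the use of the population standard deviation is an assumption, since the text does not specify the normalization.

```python
import statistics


def delta_z(z_spec, z_phot):
    """Normalized residuals (z_spec - z_phot) / (1 + z_spec)."""
    return [(zs - zp) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]


def bias(residuals):
    """Mean value of the residuals."""
    return statistics.fmean(residuals)


def sigma(residuals):
    """Standard deviation of the residuals (population form, assumed)."""
    return statistics.pstdev(residuals)


def nmad(residuals):
    """Normalized median absolute deviation, 1.48 x Median(|dz|)."""
    return 1.48 * statistics.median([abs(r) for r in residuals])


# Made-up example values, for illustration only.
z_spec = [0.10, 0.20, 0.30, 0.40]
z_phot = [0.12, 0.19, 0.30, 0.47]
residuals = delta_z(z_spec, z_phot)
print(bias(residuals), sigma(residuals), nmad(residuals))
```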

As input photometric parameters (or features) we used the
MAGAP_4 and MAGAP_6 aperture magnitudes (u, g, r, i), a
choice based on our past experience, since this combination
almost always leads to the best performances (Brescia et al.
2013, 2014b). However, it needs to be emphasized that an
improvement in the performances of a machine learning method can be
expected from an exhaustive exploration of the parameter space
through feature selection (cf. Polsterer et al. 2014). This approach,
however, is usually too demanding in terms of computing
time.

MLPQNA z_phot are in excellent agreement with z_spec, as we
show in Figs. 3 and 4, where the results of the experiment are
summarized. The upper panel of Fig. 3 shows the predicted
photometric redshift estimates versus the spectroscopic redshift values for all
objects in the blind test set. In the lower panel of Fig. 3 z_spec is
plotted vs. the residuals Δz. The underpopulated redshift bins
visible in Fig. 3 reflect the distribution of the spectroscopic sample,
which is less populated at redshift ~ 0.22 and 0.42 (see Fig. 1
and Fig. 2).

In Fig. 4 we show the distribution of residuals, which has a
kurtosis of 1.8 and a skewness of 7.07 x 10^-4, i.e. a leptokurtic and
symmetric distribution, as already found in the SDSS-DR9 case by
applying the same method (Brescia et al. 2014b). In other words,
the distribution reveals an over-density of objects in its central
region (i.e. objects with a small error), which is also reflected in the very
low percentage of outliers and a low NMAD value (see below).
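
Moment-based estimates of skewness and kurtosis can be sketched as follows. The text does not state which kurtosis convention was used; the sketch assumes the excess kurtosis (zero for a Gaussian), for which a positive value corresponds to a leptokurtic distribution. The function names are illustrative.

```python
import statistics


def central_moment(values, order):
    """Central moment of the given order about the sample mean."""
    mean = statistics.fmean(values)
    return statistics.fmean([(v - mean) ** order for v in values])


def skewness(values):
    """Third standardized moment; zero for a symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# A symmetric toy sample: the skewness vanishes by construction.
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(skewness(sample), excess_kurtosis(sample))
```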

Overall, we find a bias of 9.9 x 10^-4, a standard deviation of
0.0305 and a NMAD of 0.021. The σ_68 (i.e. the range within which
68% of the residuals fall) is 0.022, smaller than the standard
deviation, as is to be expected from a leptokurtic distribution.
Moreover, our method leads to a very small fraction of outliers, i.e.
less than 0.39% and 3.30% using the |Δz| > 0.15 and |Δz| > 2σ
criteria, respectively. If we refer to the sample of objects with
IMAFLAGS_ISO = 0, the bias, standard deviation and NMAD
become 0.00072, 0.0288 and 0.0207, respectively, while the
fractions of outliers are 0.32% and 3.26%. Thus, although the present
approach is quite immune to systematic effects in photometry, we
find a small improvement in the statistics when the sources in the
masked regions are removed from the analysis.
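
The outlier fractions and σ_68 quoted above can be evaluated as in the sketch below. Two assumptions are made here and are not taken from the text: σ_68 is computed as the 68th percentile of |Δz| with a nearest-rank rule, and the 2σ criterion uses the population standard deviation of the same residuals. The residuals are made-up numbers for illustration.

```python
import math
import statistics


def outlier_fraction(residuals, threshold):
    """Fraction of objects with |dz| above the given threshold."""
    return sum(abs(r) > threshold for r in residuals) / len(residuals)


def sigma_68(residuals):
    """Half-width enclosing 68% of the residuals (nearest-rank, assumed)."""
    ordered = sorted(abs(r) for r in residuals)
    rank = math.ceil(0.68 * len(ordered))
    return ordered[rank - 1]


# Made-up residuals, for illustration only.
residuals = [0.01, -0.02, 0.20, 0.0]
frac_fixed = outlier_fraction(residuals, 0.15)
frac_2sigma = outlier_fraction(residuals, 2.0 * statistics.pstdev(residuals))
print(frac_fixed, frac_2sigma, sigma_68(residuals))
```

For a distribution with a narrow core and extended tails, σ_68 is driven by the core while the standard deviation is inflated by the tails, which is why the former is the smaller of the two.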


4 THE PHOTOMETRIC CATALOGUE 

To produce the final z_phot catalogue, we initially considered the
multi-source KiDS catalogue, i.e. sources detected in r-band and
measured in all KiDS bands. However, it is important to underline
that all empirical z_phot prediction methods suffer from a poor
capability to extrapolate outside the range of distributions imposed
by the training sample. In the literature, several approaches have been
proposed to extend the applicability of empirical methods
outside the boundaries of the parameter space properly sampled by
the spectroscopic KB (cf. Vanzella et al. 2004; Hoyle et al. 2015).
While useful in some cases, this artificial augmentation of the KB
introduces a further level of complexity and leads to statistical
biases which are difficult to evaluate and control.

In the available spectroscopic KB we found that ~ 99% of the
KB objects fall within the following region of the parameter space:

• MAGAP_4_u < 25.1

• MAGAP_6_u < 24.7 

• MAGAP_4_g < 24.5 

• MAGAP_6_g < 24.0 

• MAGAP_4_r < 22.2 

• MAGAP_6_r < 22.0 

• MAGAP_4_i < 21.5 

• MAGAP_6_i < 21.0 

Hence, to produce the final z_phot catalogue, we have removed
all the objects that do not match the above criteria in more than one
band. The choice to retain objects with only one band not matching
the above criteria was dictated by the need to maximize the
number of objects with a redshift estimate and supported by the well
tested robustness of the MLPQNA method against non-detections
or missing data (Cavuoti et al. 2012). In Table 1 we report the
statistical indicators evaluated for two groups of objects: those having
all data points falling within the above region (clean objects) and
those (contaminated objects) with only one band falling outside of
it. It appears evident that for a one-band failure there is only a small
decrease of performance.

Figure 1. Spectroscopic redshift distribution of objects included in the training set (black line) and test set (gray line) normalized to the splitting rate.

Figure 2. Redshift distribution of objects included in the blind test set, spectroscopic (black line) and photometric (gray line).

Subset         |bias|   σ        NMAD     Outliers %      Outliers %
                                          (|Δz| > 0.15)   (|Δz| > 2σ)
clean          0.0011   0.0303   0.0212   0.38            3.13
contaminated   0.0003   0.0339   0.0223   0.47            5.80

Table 1. Statistical indicators computed for two different subsets of the
blind test set. The clean set includes only data for which the photometry
falls within the limits listed in Sect. 4. The contaminated subset includes the
objects which fall outside those limits in only one band.

However, in order to keep track of this effect, we include a z_phot
quality flag in the catalogue, set to 1 for best quality (i.e. clean) and
0 for worse quality (i.e. contaminated) objects.
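
The selection on the KB magnitude limits and the resulting quality flag can be sketched as follows. The limits are those listed in this section; everything else is an assumption made for illustration: the names KB_LIMITS, failing_bands and quality_flag are hypothetical, a band is taken to fail when either of its two aperture magnitudes is outside the limits, and the handling of missing data is not covered.

```python
# Magnitude limits enclosing ~99% of the spectroscopic KB (this section).
KB_LIMITS = {
    "u": {"MAGAP_4": 25.1, "MAGAP_6": 24.7},
    "g": {"MAGAP_4": 24.5, "MAGAP_6": 24.0},
    "r": {"MAGAP_4": 22.2, "MAGAP_6": 22.0},
    "i": {"MAGAP_4": 21.5, "MAGAP_6": 21.0},
}


def failing_bands(magnitudes):
    """Bands with at least one aperture magnitude outside the limits.

    Assumption: a band fails if either aperture is not below its limit.
    """
    return [
        band
        for band, limits in KB_LIMITS.items()
        if any(not magnitudes[f"{aperture}_{band}"] < limit
               for aperture, limit in limits.items())
    ]


def quality_flag(magnitudes):
    """Return 1 (clean), 0 (contaminated) or None (object removed)."""
    n_failed = len(failing_bands(magnitudes))
    if n_failed == 0:
        return 1
    if n_failed == 1:
        return 0
    return None  # more than one band outside the KB region


# Hypothetical source: bright in all bands except a faint i-band value.
source = {f"MAGAP_{a}_{b}": 20.0 for a in (4, 6) for b in "ugri"}
source["MAGAP_4_i"] = 22.0
print(failing_bands(source), quality_flag(source))
```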

The final z_phot catalogue consists of 1,142,992 objects
(699,155 objects have IMAFLAGS_ISO = 0 in all bands and 710,127
have the best quality flag).


5 CONCLUSIONS 

In this work we applied the MLPQNA neural network to the ESO
KiDS DR2 photometric galaxy data, using a knowledge base
derived from the SDSS and GAMA spectroscopic samples, to
produce a catalogue of photometric redshifts based on optical
photometric data only. We obtained an overall 1σ uncertainty on Δz =
(z_spec - z_phot)/(1 + z_spec) of 0.0305 with a very small average bias
of 9.9 x 10^-4, a low NMAD of 0.021, and a low fraction of outliers
(0.39% above the standard limit of 0.15).

The trained network was then used to process all galaxies in
the data set that populate a parameter space similar to that defined
by the SDSS+GAMA spectroscopic sample, producing z_phot
estimates for about 1.1 million KiDS galaxies. The catalogue will be
made available through the CDS VizieR facility.

Deriving photometric redshifts is an essential task when
dealing with large samples of galaxies, such as that expected from the
KiDS photometric survey. These redshifts are currently being used
by the KiDS collaboration for a variety of studies regarding the
evolution of galaxy stellar masses, integrated colours, colour
gradients and the structural parameters with redshift (Napolitano et al.
in prep.). The characterization of how completeness and biases of
the photo-z catalogue affect the final scientific goals is therefore
postponed to later works. These types of studies will allow us to
better constrain the processes leading to the (mass) growth of galaxies
in the last half of the current age of the universe.


6 ACKNOWLEDGEMENTS 

The authors would like to thank the anonymous referee for
extremely valuable comments and suggestions. Based on data
products from observations made with ESO Telescopes at the La Silla
Paranal Observatory under programme IDs 177.A-3016, 177.A-3017
and 177.A-3018, and on data products produced by
Target/OmegaCEN, INAF-OACN, INAF-OAPD and the KiDS
production team, on behalf of the KiDS consortium. OmegaCEN and
the KiDS production team acknowledge support by NOVA and
NWO-M grants. Members of INAF-OAPD and INAF-OACN also
acknowledge the support from the Department of Physics &
Astronomy of the University of Padova, and of the Department of
Physics of Univ. Federico II (Naples). The authors would like to
thank Amata Mercurio, Joachim Harnois-Déraps and Peter
Schneider for the very useful comments. CT has received funding from
the European Union Seventh Framework Programme (FP7/2007-































Zphot in KiDS-ES0-DR2 5 





Figure 3. Upper panel: spectroscopic versus photometric redshifts for
objects of the blind test set. Lower panel: spectroscopic redshift versus
(z_spec - z_phot)/(1 + z_spec) for the same objects.